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ABSTRACT 

The Sloan Digital Sky Survey (SDSS) and the Two Micron All-Sky Survey (2MASS) provided the first 
deep and global photometric catalogues of stars in our halo and clearly mapped not only its structure but 
also demonstrated the ubiquity of substructure within it. Future surveys promise to push such catalogues to 
ever increasing depths and larger numbers of stars. This paper examines what can be learnt from current and 
future photometric databases using group-finding techniques. We compare groups recovered from a sample 
of M-giants from 2MASS with those found in synthetic surveys of simulated ACDM stellar halos that were 
built entirely from satellite accretion events and demonstrate broad consistency between the properties of the 
two sets. We also find that these recovered groups are likely to represent the majority of high-luminosity 
(L > 5 X lO^i©) satellites accreted within the last 10 Gyr and on orbits with apocenters within 100 kpc. 
However the sensitivity of the M-giant survey to accretion events that were either ancient, from low-luminosity 
objects or those on radial orbits is limited because of the low number of stars, bias towards high metallicity 
stars and the shallow depth (distance explored only out to 100 kpc from the Sun). We examine the extent to 
which these limitations are addressed by current and future surveys, in particular catalogues of main sequence 
turn off (MSTO) stars from SDSS and LSST, and of RR Lyrae stars from LSST or PanSTARRS. The MSTO 
and RR Lyrae surveys are more sensitive to low luminosity events (L ^ XQ^Lq or less) than the 2MASS M- 
giant sample. Additionally, RR Lyrae surveys, with superior depth, are also good at detecting events on highly 
eccentric orbits whose debris tends to lie beyond 100 kpc. When combined we expect these photometric 
surveys to provide a comprehensive picture of the last 10 Gyr of Galactic accretion. Events older than this are 
too phase mixed to be discovered. Pushing sensitivity back to earlier accretion times would require additional 
dimensions of information, such as velocity and metallicity or abundance measurements. 
Subject headings: galaxies: halos - structure- methods: data analysis - numerical 



1. INTRODUCTION 

Over the last two decades, our view of the stellar halo has 
changed from a diffuse, featureless cloud of stars surrounding 
the Galaxy, to one crossed by many large-scale fe atures such 
as the tidal tails of t he Sagittarius d warf galaxy (llbata et alJ 
11994 1995; Maie wski et al] l2003h . the Virgo overden- 
sity CJuric et al. 2008) the Triangulam-Andromeda structure 
jRpcha-Pinto et al. 2004; Majewski et al. 2004; Martin et al. 
20071) and the low latitude Monocerous ring (Newberg et al. 



200a lYannyetaD 12003.) . The mapping of these low sur- 



face brightness structures can be attributed to the advent of 
large scale surveys such as the Two Micron All Sky Survey 
(2MASS) and the Sloan Digital Sky Survey (SDSS) that have 
large numbers of stars in their catalogues. Future surveys. 



such as GAIA (PeiTvmah 2002), LSST (Ivezic etalJl2009^ 
SkyMapper (.Keller etaL,2007.) and PanSTARRS, will map 
the stellar halo in ever more detail. 

The presence of these structures lends support to the ACDM 
model of galaxy formation in which the stellar halo is built 
up, at least in par t, hierarchically thr ough mergers of smaller 
satellite systems (iBullock et alJl200"li) . However, simulating 
the stellar halo in a cosmological context to test this picture 
is a challenging task for two reasons. First, the stellar halo 
is intrinsically faint, containing only about 1 per-cent of the 
total Milky Way stars. Hence, to simulate the faint structures 
within the stellar halo with adequate resolution requires enor- 
mous computation power. For example, if a satellite with a 
stellar mass of 10^ M© is simulated with at least 10'^ particles, 
then to simulate a whole galaxy having stellar mass 10^^ 
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with the same mass resolution will require a simulation with 
more than 10^ stellar particles. Second, the physics of star 
formation and its feedback effects are complex phenomena 
that have not yet been fully understood. 

In recent years progress has been made in tackling both 
these issues. Hydrodynamical simulations of galaxies includ- 
ing star forrnation and feedback recipes are now being done 
jAbadi et al."2003': 'Robertson et al .l 12004 iBr ook et al.^ '20041; 
Governato et al. 2007; Scannapieco et alj200 8; Zolotov et al. 
2009). However, the highest resolution simulations only have 
a stellar mass resolution of 10^ — 10^ M0, which is not 
enough to resolve the features in the stellar halo. Alterna- 
tively, assuming that the stellar halo is built up entirely by 
means of accretion events, hybrid techniques have been de- 
veloped that use collisionless simulations to follow the dy- 
namical evolution of stars in an analytical potential and a 
semi-analytical prescription to incorporate star formation pro- 
cesses (Bullock & Johnston 2005). Although these hybrid 
techniques are not fully self-consistent they can produce re- 
alistic models of the stellar halos with very high resolution 
and can resolve even the lowest luminosity structures. Recent 
improvements in hybrid techniques have implemented the 
semi-analytical star formation recipes directly into cold dark 
matter simulations (SJetan l2009t iDe Lucia & Helmil 120081; 
ICooper et al.ll2010h . so th at with current dark matter simu- 
lati ons like Via Lactea-I I dPiemand et al.l l2008l) and Aquar- 
ius ("Sprin gel et alj|2008l) . reaching a resolution of over 10^ 
dark matter particles within the virial radius, the future looks 
promising. 

With the tremendous progress in both theory and obser- 
vations it is now possible to compare the two quantitatively. 
Since substructures in a system are fluctuations in the density 
field one way to make this compa rison is to analyze the statis- 
tics of fluctuations. For example. iBell et aP (12008 ) looked at 
the fractional root mean square deviation of the stellar den- 
sity from a smooth triaxial model in the main sequence turn 
off stars (hereafter MSTO) of SDSS data. They compared the 
radial dependence of these deviati ons with those apparent in 
the 1 1 simulated stellar halos of Bullock & JohnstonI ( l2005h 
and found the observations and simulations to be in rough 
agreement. One could in principle also do a Fourier analy- 
sis of the fluctuations or study the angular correlation func- 
tion. While such analyses are useful for studying the global 
statistical properties of structures they cannot say much about 
individual structures. Also when calculating the rms devia- 
tions it is not clear if the deviations are due to structures or is 
it due to the inability of the analytical model to describe the 
smooth component of the halo. 

An alternative approach to comparing fluctuations appar- 
ent in observed and simulated stellar halos is to use group 
finding or clustering algorithms. The strength of the group- 
finding approach for this particular problem is that, in addition 
to simple comparisons of group properties, the results have 
a direct and si mple physical interpretation. It was shown in 
iJohnston et alj ((2008) that if substructures standing out above 
the background density can be identified (i.e. as groups) then 
the distribution of the properties of these structures can in 
principle be related to a galaxy's accretion history in terms 
of the characteristic epoch of accretion and the mass and or- 
bits of progenitor objects. Hence, group finding can be used 
to constrain the accretion history of our Galaxy and compare 
it to general expec tations of the ACDM paradigm. 

For example, in 'Sharma et al. (2010) we applied a hierar- 
chical clustering algorithm (Sharma & Johnston ,2009.) to a 



sample of M-giants selected from the 2MASS catalogue and 
recovered sixteen structures, many of which were previously 
known and associated with individual accretion events. In this 
paper, we go on to compare the properties of these recovered 
structures with those found in synthetic surveys - designed to 
mimic the observations in the 2MASS catalogue - of si mu- 
lated stellar halos (taken from lBullock & Johnston|[2005h . in 
order to assess how similar they are. We then ask what type of 
accretion events (in terms of satellite accretion time, luminos- 
ity and orbit) are recovered in the synthetic surveys, in order 
to assess how sensitive we expect the 2MASS survey be to 
our Galaxy's own history. 

The group finder is also applied to synthetic surveys of 
the simulated stellar halos chosen to mimic current and near- 
future photometric catalogues of other stellar tracers. For ex- 
ample, SDSS, which has a magnitude limit of r < 22.5, was 
able to map the stellar halo with MSTO stars out to 36 kpc 
with about 4 miflion MSTO stars ( Bell et al. 2008). A sim- 
ilar MSTO generated from LSST - which will observe the 
sky in six photometric bands, ugrizy, with a single visit lim- 
iting magnitude of r ^ 24.5 (or r ^ 27.5 using co-added 
maps) - will be capable of probing the halo out to 100 kpc 
with more than 20 million stars. In addition, both LSST and 
PanSTARRS will explore the transient optical sky and should 
be able to detect variable RR Lyrae stars out to 400 kpc and 
beyond, which is more than the expected edge of the stellar 
halo. 

For group finding we use the code EnLink 
(fSh arma & JohnstonI [2009 '). which is a density-based hi- 
erarchical group-finder As noted in S harma et al. (2010j) . 
this code is ideally suited for this application for four reasons. 
First, structures in the stellar halo have arbitrary shapes 
and a density-based group-finder is able to identify just 
such irregular groups. Second, while many density-based 
group-finders consider only groups formed above a fixed 
isodensity contour, EnLink's clustering scheme can identify 
groups at all density levels. This is essential because stellar 
halo structures span a wide range of densities that cannot 
be separated by a single isodensity contour. Third, halo 
structures can have nested substructures and EnLink's orga- 
nizational scheme allows the detection of this full hierarchy 
of structures. Finally, the group-finder gives an estimate of 
the significance of the groups, so spurious clusters can be 
ignored. 

The paper is organized as follows. Section |2] describes the 
data sets used in the paper Section [3] discusses the methods 
employed for analyzing the data — i.e., group finding. In Sec- 
tion m we ask how much any photometric survey can tell us 
about substructures in the stellar halo by looking at an ideal- 
ized survey, simulated in the absence of observational errors. 
In Section|5]we look at groups derived from simulated surveys 
constructed under more realistic circumstances. In Section|6] 
we discuss the implications of these results for groups recov- 
ered from current surveys of M-giant stars (from 2MASS) and 
MSTO stars (from SDSS) in the real Universe. Finally we 
summarize our findings in Section|2l 

2. DATA 

2.1. The 2MASS M-giant data 

We use a catalog of 59392 M-giant s identified from the 
2MASS survey in a companion paper ISharma et al.1 (l20Tol) 
(Paper I). A brief description of the catalogue and the selec- 
tion procedure is repeated below. The 2MASS all sky point 
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source catalog contains about 471 million objects (the major- 
ity of which are stars) with precise astrometric positions on 
the sky and photometry in three bands J, H, Kg. The survey 
catalog is 99% complete for Kg < 14.3. An initial sample of 
candidate M-giants was generated by applying the selection 
criteria: 

lO.O<Ks<U.O (1) 
J-Ks> 0.97 (2) 
J-i7<0.561(J-i\:s)+0.36 (3) 
J - H>Q.bQl{J - Ks) + Q.l9 (4) 
i<'ssin(6)>14.0sin(15°). (5) 

All magnitudes in the above equations are in the intrin- 
sic, dereddened 2MASS system (labeled with subscript 
hereafter), with dereddening applied using the Schlegel et al] 
(119981) extinction maps. These selection criteria and 
the dereddening me thod are similar to those used by 
iMaiewski et alj (l2003h to identify the tidal tails of Sagittarius 
dwarf galaxy. 

Distance estimates for these stars were made by assum- 
ing a uniform metallicity of [Fe/H] — —1.0 and an age 
of 13.1 Gyr and calculating a linear fit to the color magni- 
tude relation of the giant s tars using a theoret i cal isochrone 
from the Padova database jBertelli et al.lll994t iMarigo et alj 
|2008; Bonatto et al. 2004). The best fit relationship is Mk = 
3.26 — 9.42( J — A's)o where Mk is the absolute magnitude 
of the star and (J — Ks)q its color Using this relation the 
distance modulus is then given by 

A* = (i^s)o- 3.26 + 9.42(J-if,)o. (6) 

The specific values of age and metallicity were adopted be- 
cause a) they roughly correspond to the expected values for 
the stellar halo and b) they lead to the identification of the 
largest known stellar halo structure in the data, the tails of the 
Sagittarius dwarf galaxy, with maximum clarity. 

2.2. Simulations 

To make theoretical predictions of structures in the 
stellar halo, we use the eleven stellar halo models of 
iBullock & Johnstonl (12005 ). which were simulated within the 
context of the ACDM cosmological paradigm. These simu- 
lations follow the accretion of individual satellites modeled 
as A^-body particle systems onto a galaxy whose disk, bulge, 
and halo potential is represented by time dependent analyt- 
ical functions. Semi-analytical prescriptions are used to as- 
sign a star formation history to each satellite and a leaky 
accreting box chemical enrichment model is used to calcu- 
late the metallicity as a function of age for the stellar pop- 
ulations (|Robirtionitan|200l |^nm The three 
main model parameters of an accreting satellite are the time 
since accretion, tacc, its luminosity, Lgat, and the circularity 
of its orbit, defined as e = J/Jdic (J being the angular mo- 
mentum of the orbit and Jcirc the angular momentum of a 
circular orbit having same energy). The distribution of these 
three parameters describes the accretion history of a halo. To 
study the sensitivity of the properties of structures in the stel- 
lar halo to accretion history, we additionally use a set of six 
artifi cial stellar halo model s (referred to as non- ACDM halos) 
from lJohnston et aT] (l2008h that have accretion events that are 
predominantly (i) radial (e < 0.2), (ii) circular (e > 0.7), 
(iii) old (iacc > 11 Gyr), (iv) young (tacc < 8 Gyr), (v) 
high-luminosity (Lgat > 10^ L©), and (vi) low-luminosity 
(Lsat < 10^ L©). 



2.2.1. Generation of synthetic surveys from simulations 

Each A^-body particle in the simulations represents a pop- 
ulation of stars having a certain stellar mass, a distribution 
of ages and a monotonic age-metallicity relation. The real 
data from surveys on the other hand are in the form of color- 
magnitude combinations of individual stars, shaped by obser- 
vational or other selection criteria. The procedure to convert a 
simulated stellar halo model to a synthetic survey consists of 
four steps Q 

• Spawning the stars: We use thel Chabrieii(l200Tl) expo- 
nential initial mass function (IMF) to generate stars cor- 
responding to the given stellar mass of an A^-body parti- 
cle. If M is the stellar mass associated with the A^-body 
particle and rrimoan the mean mass of the IMF distribu- 
tion then the number of generated stars is given by ap- 
proximately Af /rrimoan- We use stochastic rounding to 
convert M/TOmcan to an integer — i.e., if the fractional 
part of M/minoan is less than a Poisson distributed ran- 
dom number with a range between and 1, we incre- 
ment the integral part by 1 . Each generated star is as- 
signed a mass that is randomly drawn from the IMF, an 
age that is randomly drawn from the given age distribu- 
tion and metallicity computed from the age-metallicity 
relation. 

• Assigning color and magnitude: Next, a finely spaced 
grid of isochrones obtained from the Padova database 
jBertelli et al. 1994; Maiigo et al. 2008; Bonatt o et aP 
12004; Girardi et al. 2004) is interpolated to calculate the 
color and magnitude of each generated star The abso- 
lute magnitude is converted to an apparent magnitude 
based on the distance of the A^-body particle from a 
given point of observation. In this paper we assume the 
point of observation to be at 8.5 kpc from the Galac- 
tic Center in the plane of the Galactic disk. Finally, 
depending upon the color magnitude limits of the sur- 
vey being simulated, the stars are either accepted or re- 
jected. 

• Distributing the stars in space: The position coordinates 
of first star that is spawned by an A^-body particle, is as- 
sumed to be same as that of the spawning particle. Each 
subsequent star spawned by the A^-body particle is dis- 
tributed in a spherical region around the particle. The 
radius R of the spherical region is determined by the 
A^-body particle's 32nd nearest neighbor in (X, Y, Z) 
position space. For this the multi-dimensional density 
estimation code EnBiD ( Sharma & Steinmetz 2006) is 
used. The stars are radially distributed according to 
the Epanechnikov kernel function (1 — r'^)r'^ where 
r' = r/R and r' varies from to 1. A surface density 
map in X-Z plane of one of the ACDM halos sampled 
using the above scheme is shown in Figure [T] It can be 
seen that the structures are quite adequately reproduced. 

• Observing our stars: In simulations the distance of the 
stars is exactly known but in real surveys the distance is 
estimated by using observed properties of the selected 
stars. We use the same procedure to estimate the ob- 
served distances for the stars in our synthetic surveys. 

' a code named Galaxia was developed for this, details of which will be 
presented in a future paper 
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TABLE 1 

Synthetic survey data sets generated from simulations. 



Name 


Type 


Color limits 


Apparent 
Mag 


distance 
limit kpc 


f sample 


o-^ 


Stars 




Groups 


Purity 


SI 


Ideal 


0.1 < - r < 0.3 


r < 27.5 


363 


0.25 


0.0 


~ 1.1 X 


lo' 


62.4 


0.82 


S2 


2MASS 


(J - K,)„ > 0.97 


{K,)„ < 14 


94.5 


I.O 


0.34 


~ 4.7 X 


10* 


8.4 


0.68 


S3 


MSTO 


0.1 < g - r < 0.3 


r < 24.5 


100 


0.25 


0.64 


~ 9.1 X 


10" 


9.4 


0.69 


S3' 


MSTO 


0.1 < g - r < 0.2 


r < 24.5 


100 


1.0 


0.58 


~ 5.9 X 


10" 


15.0 


0.66 


S4 


RR Lyrae 




r < 24.5 


600 




O.ll 


~ 6.2 X 


10* 


7.9 


0.69 


S5 


SDSS(8000 sqd) 


0.1 < 3 - r < 0.3 


r < 22 


30 


1.0 


0.61 


~ 1.7 X 


10" 


2.3 


0.39 



Note. — Each data set contains synthetic surveys of seventeen stellar halos, of which eleven are with ACDM accretion history and six are with non-ACDM accretion history. The 
reported number of groups and purity are average over 1 1 ACDM halos. 



2.2.2. Synthetic surveys and data sets 

We use the simulations and the procedure described above 
to generate five different types of synthetic surveys, namely 
SI, S2, S3, S4 and S5 (summarized in Table[T]). Each data set 
consists of surveys of seventeen stellar halos, eleven of which 
correspond to halos having ACDM accretion history (ACDM 
halos). The remaining six correspond t o art ificial accretion 
histories (non-ACDM halos see Section |272] | and are named 
as radial, circular, old, young, high luminosity and low lumi- 
nosity halos to reflect the properties of the dominant accretion 
events in them. Below we briefly describe these data sets and 
discuss their properties. 

• Data set SI — "ideal" surveys 

The first data set, SI, uses a color range of 0.1 < 
g — r < 0.3 (color from SDSS ugriz bands) to select 
MSTO stars and has an apparent magnitude limit of 
r < 27.5. The color and magnitude limits were chosen 
to encompass the bulk of the stellar halo MSTO stars 
(since the main sequence stars have a typical absolute 
magnitude of 4.7 this corresponds to a distance limit of 
^ 363 kpc for the survey). This data set is designed 
as an idealized test case and hence we assume that the 
distances to the stars are exactly known. A comparison 
of the results of the group-finder applied to this ideal 
data set with those from more realistic surveys allows 
us to distinguish the limitations to sensitivity imposed 
by physical effects from limitations that are due to ob- 
servational constraints. For computational ease we sub- 
sample the data by a factor of 0.25, which results in a 
sample size of about 10^. In AppendixlAlwe show that 
this sub-sampling has negligible impact on the analysis 
presented here. 

• Data set S2 — 2MASS M-giant surveys 

The second data set, S2, mimics the 2MASS M-giant 
sample. Color and magnitude limits of (J ~ Ks)o > 
0.97 and (if,)o < 14 (2MASS JHK band) are used to 
genera te the sample. These limits are similar to those 
used in lSharma et alj (120101) to identify structures in the 
stellar halo from 2MASS M-giants. A magnitude limit 
of {Ks)o < 14 means the data explores distances out to 
- 95 kpc. 

The distance modulus fi is calculated from the apparent 
magnitude of the star and its color using the relation 



M= (i^,)o- 3.26 + 9.42(J-/f,)o, 



(7) 



which corresponds to an is ochrone of age 13.1 Gyr and 
metallicity [Fe/H]=-1. See lSharma et alJ (12010^ for fur- 
ther details. However, this color-distance relationship 



is accurate only for a particular metallicity. In real sys- 
tems there is a range in the metallicities of the stars, 
so the distance estimated using equation ^ has some 
uncertainty associated with it. In order to evaluate the 
size of this error we calculated the dispersion about the 
actual value /i of stars separately for each satellite sys- 
tem in the eleven ACDM halos and then computed cr^ 
as the mean value of this dispersion over all the satel- 
lites. (This approach avoids the computed value of cr^ 
being dominated by the largest satellite system in the 
halo.) We found = 0.34 mag, in agreement with 
the estimates of IMaiewski et al.l (12003) where they re- 
port (T^ = 0.36 for the 2MASS M-giants in the core of 
Sagittarius. This suggests that we can expect an uncer- 
tainty of about 18 per-cent for the distance of M-giants. 

• Data sets S3 and S3' — MSTO surveys 

The third data set S3 mimics an F and G-type MSTO 
star survey like the one possible with LSST, assuming a 
magnitude limit of 24.50 (which corresponds to a dis- 
tance limit ^100 kpc) and a color range 0. 1 < 5 — r < 
0.3 chosen to match our "ideal" data set SI. For com- 
putational ease, like data set SI, data set S3 is also sub- 
sampled by a factor of 0.25 (see AppendixlAlfor further 
details). We generate a second set of MSTO surveys, 
S3', with a more limited color range 0.1 < g ~ r < 0.2 
in order to explore how our results depend on our se- 
lection criteria. The photometric error was assumed to 
vary as 



Am,. 



(0.04 - 0.039)a; + 0.039x2 



(8) 



where x = 10. 0°-^^'"'' '"^^ an d = 24.5, in ac cor- 
dance with LSST specifications llvezic et al.l (l2009h . 

The distances for these data sets are estimated from 
the r-band apparent magnitude using the relation 
fi = nir — 4.7 for the distance modulus. We estimated 
(7^ for S3 and S3' using the same approach as for the 
S2 data set, and these are tabulated Table [T] For in- 
dividual satellite systems cr^ was also found to have a 
dependence on the mean metallicity of the systems - 
for color range 0.1 < g — r < 0.4, with metallicity 
[Fe/H] decreasing from -1 to -1.6, was found to in- 
crease from 0.7 mag to 0.82 mag. For [Fe/H] below 
— 1.6, cTp was roughly constant at a value of 0.82 mag 
but there was a scatter of about 0.06 mag. Hence for 
[Fe/H]=-2 our estimated value of cr^ — 0.82 ± 0.06 is 
in agreement with ct^ = 0.9 of MSTO stars reported by 

^ This magnitude limit corresponds to the single visit depth of LSST: the 
co-added depth of LSST can reach up to 27.5 
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(iBell et al.l2008l) for globular cluster systems with color 
range 0.2 < g - r < 0.4 and metallicity [Fe/H] ^ -2. 

Note Bell et al. ( 2008) while analyzing the stellar halo 
using SDSS had used a color limit of 0.2 < g - r < 0.4 
to isolate the MSTO stars. However, the spread in 
absolute magnitude of MSTO stars is known to in- 
crease with increase in g ~ r color Hence to improve 
the distance estimation we here adopt a color limit of 
0.1 < g - r < 0.3 (data set S3), which is shifted by 
0.1 m ag bluewards as compared to that of iBell et al.l 
(12008 ). One can possibly extend the blue limit even 
more bluewards, but there are very few MSTO stars 
with g — r < 0.1, moreover, one increases the chance 
of contamination from non MSTO stars. 

Since we use a single absolute magnitude to estimate 
the distance, small features in the color versus absolute 
magnitude diagram can generate duplicate groups for 
some very luminous structures. One such feature in the 
0.1 < g — r < 0.3 color range is due to the horizontal 
branch stars. To avoid this we restrict our analysis to 
stars with absolute magnitude Mr > 1. 

• Data set S4 — RR Lyrae surveys 

The fourth data set, S4, mimics an LSST RR Lyrae 
survey and is generated by identifying the horizontal 
branch stars that intersect the instability strip in the lu- 
minosity temperature diagram. The blue edge of the 
instability strip is given by equation 

logTsB = 3.999 - 0.079 log(L/ L©) 

+0.056 log(M/ Mq) + O.Oer (9) 

from ICaputo et al.l (119871) and we assume the helium 
fraction to be F = 0.2409 and the mean mass of 
RR Lyrae to b e M = 0.645 Mfr^, similar to the val- 
ues adopted by iCatelan et al.l (1200 4*) for their theoreti- 
cal studies. The width of the strip Alog(re//) is as- 
sumed to be 0. 09, which is slightly higher than the val- 
ues adopted bv ICatelan et al.l d2004.) . The range of lu- 
minosity was assumed to be 1.4 < log(L/ Lq) < 2.0. 
To avoid contamination with main sequence stars the 
mass was assumed to lie in a mass range of 0.5 < 
log(Af/ Mq) < 1.0. Next, to match the number 
of stars in the survey to the expected number of RR 
Lyraes in the Milky Way we set the sampling fraction 
to /sample — 2. This generated about 6 x lO'' stars with 
a density of about 1 .5 per square degre e in accordance 
with the results of Ivezic et al. (2005) on SDSS DRl 
data and Vivas eLal. (2004) on QUEST survey (about 1 
to 1 .3 per square degree). The apparent magnitude limit 
was assumed to be rn^ = 24.5, which is within the de- 
sign Umits of LSST and PanSTARRS. The RR Lyrae 
stars were found to have a mean absolute magnitude of 
about Mr ~ 0.55, which implies a distance limit of 
600 kpc for the survey — beyond the outer edge of he 
stellar halo. 

For our analysis we assume a distance uncertainty 
of 5 per-cent for the RR Lyraes (i.e. scattering the 
true distances of stars in our surveys with a disper- 
sion ar/r = 0.05). This value may be slightly opti- 
mistic given the present state of the art measurements 
but probably within reach in future. If the period and 



metallicity of an RR Lyrae can be accurately mea- 
sured then its dista nce can be estimated to b etter than 5 
per-cent accuracy ( Caceres & Catelarill2008l) . However 
[Watkins et al. (2009) report a distance measurement of 
about 8 per-cent accuracy for RR Lyraes in SDSS using 
a technique that utilizes the light-curve shape itself to 
estimate metallicities. 

• Data set S5 — SDSS survey The last data set mim- 
ics an MSTO catalog from the Sloan digital sky sur- 
vey (SDSS). Like data set S3 it uses a color limit of 
0.1 < g - r < 0.3 to isolate the MSTO stars. The 
magnitude limit of the survey was set to r < 22.0 
and it covers an area of 8000 square degrees towards 
the north galactic cap. The photometric error was as- 
sumed to vary as described by equation dH) but with 
7715 = 22 .6, wh ich roughly reproduces the SDSS errors 
(Fig-7 in lJuriTet al . (200l)). 

3. METHODS 
3.1. Group finding 

In this paper we use the hierarchical group-finder EnLink 
that can cluster a set of dat a points defin ed over an arbitrary 
space (described in detail in lSharma & Jo hnston 2009). Paper 
I contains a complete discussion of the optimum parameters 
for the group finder, as well as the choice of coordinate system 
for the data. We include a brief outline of each of these below. 

EnLink identifies the peaks in the density distribution in the 
data-space as group centers. The region around each peak, 
which is bounded by an isodensity contour corresponding to 
the density at the valley, is associated with the group. The 
number of neighbors employed for this density estimation was 
chosen to be fc = 30: a smaller value makes the results of 
the clustering algorithm sensitive to noise in the data, while a 
larger value means that small structures go undetected. 

EnLink is built around a metric that can automatically adapt 
to a given data set, and hence can be applied to spaces of 
arbitrary numbers and types of dimension. However, it was 
shown in Paper I that the most effective way to deal with pho- 
tometric data sets where the angular position of stars is well 
known, but the distance can be very uncertain is to transform 
to a Cartesian coordinate system and use a simple Euclidean 
metric. Prior to the transformation, a new distance estimate 
r' = 5(log(r/(10pc))) — /io is adopted, where fiQ is a con- 
stant that determines the degree to which the radial dimension 
is ignored or used. The optimum value of fiQ was found to be 
8. Note, the clustering results were not overly sensitive to the 
exact choice of /iq. 

A group finding scheme in general also generates false 
groups due to Poisson noise in the data. In order to make 
meaningful comparison between different data sets with dif- 
ferent number of data points it is imperative that one uses 
a consistent and objective scheme to screen out spurious 
groups. In EnLink spurious groups that can arise due to Pois- 
son noise in the data are screened out by looking at the signif- 
icance 

^ ^ In(pinax) - In(pmin) ^^^^ 

The contrast between the density at the peak (pmax) and valley 
(Pmin) where it overlaps with another group can be thought of 
as the signal of a group and the noise in this signal is given by 
the variance ^{^^{p} associated with the density estimator. An 
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Fig. 1. — Plots showing the distribution of stars and groups in the X-Z 
space for one of the simulated stellar halo of data set S 1 . Top panel: surface 
density map of stars in the halo. Lower panel: a plot of stars which have been 
identified as groups by the group finder. Different colors in the plot represent 
different groups. The black filled circles denote the point of peak density 
within a group. 

advantage of such a definition is that the resulting distribution 
of significance of spurious groups is almost a Gaussian and 
hence the cumulative number of groups as function of signif- 
icance can be written as 



Gi> S)= [1- erf 



Sfi 
V2 



(11) 



where N is the number of data points and k the number of 
neighbors used for density estimation and /i and J2 are con- 
stants for a given k and the dimensionality of data d. We 
adopt a threshold S — S^h for each of our synthetic surveys 
(depending on the number N of stars in the survey) that sat- 
isfies the condition G(> 5Th) = 0.5. This means that the 
expected number of spurious groups in our analysis is 0.5. In 
EnLink for a chosen 5Th, all groups with S < S'xh are de- 
nied the status of a group and are merged with their respective 
parent groups. 

A value of /i = 1.0 and /2 = 15.5/(d'^-^k ^-^) were empir- 
ically derived bv lSharma & JohnstonI (l2009h and shown to be 
valid for a wide range of cases. An improved version of the 
formula with /i = ^(<i/4.0)(l - 2.3/k) and = 0.4 was 



10-^ 
10-^ 

A 

a 10"= 
10"^ 
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Fig. 2. — Cumulative distribution of significance of groups for different 
data sets and its comparison to the expected distribution for a structure-less 
halos (a Gaussian and a Hernquist sphere). Group finding analysis is done 
in the modified radial coordinates, except for the Gaussian sphere which is 
analyzed in the normal co-ordinates. The adopted analytic relation is shown 
as the dashed line. The symbols on the plot mark average (over 1 1 ACDM) 
significance threshold that is adopted for each data set. Only groups 
with S > 1 are shown in the plots. 



presented in iBland-Hawthorn et alj ( 120101) . But these formu- 
las were derived for the case where the data is generated by a 
homogeneous Poisson process and are reasonably accurate for 
most applications. For non-homogeneous data the presence of 
density gradients on large scales makes the formula slightly 
inaccurate. Hence for greater precision one should derive the 
values of /i and /2 for a model data whose large scale distri- 
bution is similar to the data being studied but otherwise does 
not have any substructures in it. We consider two models here 
a) a Gaussian sphere b) a Hernquist sphere with a scale radius 
of 15 kpc, which was shown to be an appro priate descrip- 
tion of the stellar halo on large scales by Bul lock & Johnsto"nl 
( 12005b . The sample size of models was set to 10^. A value of 
/i = 0.93 and /2 = 0.4 was found to fit the distribution of 
significance of spurious groups and this is shown in Figure|2l 
The distribution of significance averaged over 11 ACDM ha- 
los for the rest of the data sets is also shown. At small S the 
number of groups are dominated by the spurious groups and 
the curves lie on the predicted relationship. At large S due to 
presence of real structures the curves flatten out. The symbols 
in the figure denote the mean adopted value of Sxh for each 
data set. The distribution of stars in data set S5 is very simi- 
lar to data set S3 except for the fact that it is not over all sky. 
Hence we revise the value of Sxh for the data set S5 slightly 
from 4.55 as shown in the plot to 4.75. Note, for data sets 
with sample size less than 10^ the adopted relation is found to 
slightly underestimate the number of spurious groups but we 
ignore this. Even if this fact is taken into account this would 
only lead to a very minor revision in values of Styi for data 
sets S2 and S4 and since the corresponding curves are quite 
flat in this region the number of detected groups would also 
not change much. 

An example application of the group finder to one of the 
halos from data set SI is shown in Figure [1] The upper panel 
shows the surface density of stars in the X-Z plane while 
the lower panel shows the stars which have been identified 
as groups. It can be seen that our scheme is very successful 
in detecting the structures. Even faint structures that are not 
easily visible in the surface density maps are detected by the 
group finder. 
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EnLink is a hierarchical group finder and the reported 
groups obey a parent child relationship, with one master group 
being at the top. In our case this master group represents the 
smooth component of the halo and the rest of the groups are 
considered as substructures lying within it. Note, that since 
we analyze magnitude limited surveys we expect two density 
peak even in a smooth halo: one at the galactic center and the 
other at the location of the sun due to the presence of a large 
number of low luminosity stars. Hence we ignore all groups 
whose density peaks lie within 5 kpc of sun or the galactic 
center. 

3.2. Evaluation of clustering 

When we apply EnLink to our synthetic data sets, because 
we know the progenitor satellites from which the stars origi- 
nally came from, we can make some quantitative assessment 
of how well the group finder has performed. Following Paper 
I, we define purity as the maximum fraction of stars in a group 
that came from a single satellite. If / is the set of all satelhtes 
and J is the set of all groups, then 

PurityO>^^^^^-iil^, (12) 
n.j 

where is the number of stars from satellite i in group j and 
n,j — J2i ''^ij is the total number of number of stars in that 
group. The mean value of purity P Purity(j/)/| J| is a 
good indicator of to what extent the recovered groups corre- 
spond to real physical associations. 

4. RESULTS I: SURVEYS IN AN IDEALIZED UNIVERSE 

As a crucial step towards determining how groups recov- 
ered from a photometric survey can be related to accretion 
events in the Galaxy's history we first isolate the limitations 
imposed by the phase-mixing of de bris, rather than observa- 
tional constraints. In Section 14.11 we characterize how the 
peak density of debris — which determines whether it might 
be detectable, whether by a group-finder as a group or by 
some other means — depends on the physical characteristics 
of an accretion event. We then apply what we have learned 
to understand which progenitor satellites produce detectable 
groups in a full stellar halo (Section 14.2b and finally to out- 
line the prospects for interpreting the full distribution of group 
characteristics in terms of accretion histories (Section 14.31 1. 
For the l atter tw o sections we use the idealized data set S 1 (see 
Section 12.2. It , consisting of synthetic surveys with MSTO 
stars as tracers but the distance to the stars being assumed 
to be known exactly. 

Note that we restrict our attention to the properties of 
groups in stellar halos that correspond to unbound structures 
because the number and distribution of the structures that are 
still bound (i.e., the missing satellite problem of the ACDM 
paradigm) is still an issue of much debate. In the simu- 
lations, the star formation recipes were tuned to match the 
observed distribution of the still surviving satellite galaxies 
in the Milky Way know n at the time of the publication of 
iBuUock & JohnstonI ('2005). Hence the number of satellites 
are not a prediction of the models but rather a boundary con- 
dition. The true prediction of the simulated models is the 
number of unbound structures corresponding to the ancestral 
siblings of the present day classical dwarf spheroidal satellites 
that are formed by the dynamical interaction of these satellites 
with the potential of the host galaxy. 




log[ 4.49Le exp(-t„,J/r^] 

Fig. 3. — Ppcak in arbitrary units for all 1515 simulated accretion events 
plotted as a function of 4.49e-'^. The stars are for unbound structures 
while the triangles are for bound structures. 

4.1. Evolution of the peak density of satellite debris 

The peak density ppcak of an unbound structure evolves in 
time from an initial maximum state set by the structural prop- 
erties of the satellite before disruption to a final minimum 
state where it is fully mixed in the entire volume roughly out- 
lined by the orbit of the satellite. In the intermediate state, 
transient structures such as streams, shells and clouds are ap- 
parent. Below, we individually analyze each simulated satel- 
lite debris (a total of 1515 satellites' debris from our eleven 
stellar halo models) in order to understand the dependence of 
peak density on the properties of its progenitor satellite during 
each of these stages. 

Initially the structure is bound and its peak density Ppoak is 
observed to depend primarily upon the luminosity L of the 
progenitor satellite. The exact dependence assumed in the 
simulations is given by ppcak « and com es from fitting 
King profiles to observed dwarf galaxy data (lMatedri998j) . 
In the final, completely phase mixed state the ensemble of 
stars in the original satellite occupy a large volume V„iix and 
have peak density of order L/Vmix- In a spherical potential, 
Vmix is expected to be inversely dependent on the circularity 
of the orbit e because the debris occupies the region between 
the apocenter and the pericenter of the orbit and an eccentric 
orbit (with lower e) explores a wider range in radius. In the 
case of an axisymmetric potential the debris also occupies the 
region generated by the precession of the orbit about the axis 
of the system. Evolution in the intermediate state is governed 
by phase mixing, which occurs on a characteristic time scale 
Tmix that d epends on both the circularity £ and th e mass of 
the satelHte (lJohnstonlll998l:lHelmi & Whitdl 19991) . which in 
turn is correlated to luminosity L. In this stage, the density 
has both spatial and temporal dependence. Using the action 
angle formalism Helmi & White! (11999) analytically deduced 
the general dependence as ppcak l/{r'^t1^^) for spherical 
potentials and as ppoak ^/{r^t^cc) axis-symmetric po- 
tentials, where r is the distance of the peak density of a satel- 
lite debris from the center of the parent galaxy and tacc is the 
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time since accretion of the progenitor satellite. 

Rather than model these multiple stages and effects inde- 
pendently, we instead examine empirically to what extent the 
peak densities of debris in our simulations can be connected 
to progenitor properties with a single formula. Overall, the 
physical effects described above lead us to expect that ppoak 
follows a radial dependence and is largest for objects that 
are luminous, recently accreted and are disrupting on mildly 
eccentric orbits. We find that the relation 



provides the least scatter about the actual value ppcak mea- 
sured for the simulated satellite debris (FigureO and is a rea- 
sonable representation of trends discussed earlier. Note that in 
Figure [3] instead of associating ppcak simply with the global 
density maximum, we find the point where pr^ is a maximum, 
a choice that effectively identifies the point where the density 
contrast with respect to the background is greatest. 

4.2. Probability of detecting accretion events as groups 

The top panels of Figure |4] shows the average accretion 
history adopted in eleven ACDM model halos by plotting 
the probability of occurrence of accretion events P{E) (i.e. 
the number fraction of events of a given (L,tacc in the left 
hand panel, and of a given (e, tacc) in the right hand panel). 
Note the distribution of e is non-uniform for recent events be- 
cause we only included those that gave rise to unbound struc- 
tures — events on circular orbits (e ^ 1) remain bound for 
a longer time than those on more eccentric orbits (e ^ 0). 
For t < 8 Gyr the probability also abruptly goes to zero for 
L < 10^ L0. This is because recent accretion events with 
L < 10^ L0 are not included in the simulations. 

We can use our understanding of the evolution of the peak 
density of satellite debris to develop some intuition for which 
of the events in the top panels of Figure |4] are most likely to 
be recovered as groups. An unbound structure in a stellar halo 
is expected to be detected as a group if the ratio Ppcak/ Pb > ^ 
(where pi, is the density of the background stars, i.e., halo 
stars after excluding the stream under consideration). Us- 
ing equation [TO] this corresponds to a significance level of 
S ^ 3. The ra dial distribution of halo stars was explored 
in iBuUock & Joh nston (2005) and they found that the slope, 
dl'a{p)/d\nr, transitions between -1 for r < 10 kpc to —3.5 
for r > 50 kpc . As a first order approximation we assume 
Pb oc 1/?'^, which when combined with equation (fTsT l gives 
for the ratio of peak to background density as 

Ppcak//36 0ce"*''^'^Le (14) 

In the above equation, among the three parameters of an ac- 
cretion event, the strongest dependence is on iacci so the old- 
est accretion events have the least probability of being de- 
tected. For given tacc, events with high luminosity and on 
more circular orbits (high e) are more likely to be detected. 

We confirmed this intuition by applying the group finder to 
the surveys if the eleven ACDM models our idealized data 
set S 1 — the second row of panels in in Figure H] shows the 
fraction of accretion events in the {L, <acc) and (e, t^cc) plane 
that were recovered as groups from these surveys. Formally, 
if P{E) is the probability of occurrence of all the accretion 
events and P{SE) the probability of an event being identified 
by the group-finder in a stellar halo, then the panels plot the 
probability of detecting an event in a stellar halo, given by the 



conditional probability P(S'|i;) = P{SE)/P{E). A com- 
parison of the second and top rows in the figure demonstrates 
that nearly all events on all types of orbits and of all luminosi- 
ties are recovered for tacc < 8 Gyr. Older events on more 
circular orbits are detectable as grou ps e ven further back, as 
might be anticipated from equation (fl4] |. However, few ac- 
cretion events older than 10 Gyr ago were recovered and none 
older than 1 1 Gyr We conclude that the phase-mixing of de- 
bris imposes this fundamental limit, and photometric surveys 
alone will never be able to explore this epoch of our Galaxy's 
accretion history. 

4.3. Relating group properties to accretion history. 

Having confirmed how the recovery of a disrupted satellite 
as a group depends on its accretion characteristics, we now 
examine to what extent the distribution of group properties in 
a stellar halo reflects the distribution of properties of accreted 
objects, and hence what we might be able to say about the 
recent Galactic accretion history. 

For example, simple intuition suggests that the number of 
groups should reflect the number of recent events. The top 
row of Table |2] shows that we recovered an average of 62 
groups from our S 1 ACDM data sets with a minimum of 36 
and a maximum of 82. As anticipated, we recovered signif- 
icantly more groups from those of our non-ACDM that had 
a larger number of recent accretion events (i.e. the ones bi- 
ased towards recent accretion, or dominated by many low- 
luminosity events) and fewer groups from those that had a 
smaller number (i.e. the ones biased towards ancient ac- 
cretion, or dominated by few high-luminosity events). The 
non-ACDM stellar halo formed from predominantly circu- 
lar events also gave rise to more events as these can be de- 
tected for longer (as demonstrated in Section l4~2l and by equa- 
tion (fT4l l) 

Similarly, since accretion time is the dominant factor that 
determines the detectability of an accretion event, we expect 
the fraction of halo material detected as groups to be primarily 
related to the integrated mass accretion history of the halo and 
less sensitive to the total number of such accretion events. The 
top row of Table |3] shows that the fraction of stars in groups 
varies widely for our S 1 ACDM data sets with a minimum of 
less than 2% and a maximum over 20%: the total fraction is 
heavily influenced by the most luminous object accreted on to 
the halo, and these are very few in number The recently ac- 
creted halo (i.e., the young halo) shows the maximum fraction 
in groups as the structures have not yet dispersed. In spite of 
the total number of accretion events being very different, the 
low luminosity and high luminosity halos have similar frac- 
tions because for both of them a similar amount of mass was 
accreted within the last 10 Gyr. The circular halos, which can 
probe much older events, have higher fractions compared to 
the radial halos. 

We would also like to know the luminosity function of ac- 
creted objects, which should be reflected in the distribution 
of group stellar masses mgioup- The thin black lines in the 
top left panel of Figure |5] show the normalized, cumulative 
distribution of group masses recovered from our 1 1 ACDM 
— we found that our non-ACDM halos that were constructed 
from events that had a ACDM luminosity function (i.e. the 
old/recent and circular/radial halos) all lay within the region 
occupied by the -ACDM lines (i.e. they had the same group 
mass function). In contrast, the brown line in this panel shows 
the group mass function for the high-luminosity non-ACDM 
halo, which is clearly distinct from the ACDM Unes and bi- 
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Fig. 4. — Probability distribution of detecting accretion events in the (L, tacc) and (e, tacc) parameter space for various photometric surveys. Only events 
giving rise to unbound structures are shown. The type of data set is labeled on each panel, further details on labels can be found in Table[T] The color scheme is 
such that the darkness increases monotonically with probability. The panels labeled all events shows the probability distribution of all the accretion events and is 
plotted in logarithmic units (see color bar). The other columns show the conditional probability of detecting the events in various data sets given the probability 
of all the events and it is plotted in linear units (see color bar). 
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Fig. 5. — Normalized cumulative distribution of stellar mass (number of 
stars in a group) in groups discovered by the group-finder for various pho- 
tometric surveys. The curve coiTesponding to high luminosity halo is easily 
distinguishable from other halos. 



ased towards higher mass groups. While the corresponding 
results for the low-luminosity halo are less striking in this 
panel (yellow line), Table [2] already demonstrates that the 
difference in luminosity function could be seen in the far 
larger number of groups recovered from this halo than for the 
ACDM halos. 

The last unknown in the accretion history is the orbit distri- 
bution. Since eccentric orbits with low e have larger apocen- 
ters than more circular orbits of the same energy and a larger 
fraction of time during an orbit is spent near the apocenter 
rather than the pericenter, groups formed from radial accre- 
tion events are expected to be detected further away from the 
Galactic center The probability distribution of radial distance 
of groups for data set S 1 shown in top left panel of Figure |6] 
demonstrates this. The black lines are again for the ACDM 
halos. The radial (purple) and the circular (cyan) halo profiles 
clearly outline the black lines. More than 2/3 of the groups in 
the radial halo are above lOOkpc while 2/3 of the circular halo 
groups are below 60 kpc. For the ACDM halos about 30% of 
the groups lie above 100 kpc. 

In summary, our analysis here using data set SI suggests 
that, if the distance of the stars is known accurately, then 
number of groups, amount of material in substructures and 
the probability distributions of masses and radial distance can 
be related to the number, luminosity function and orbit dis- 
tribution of recent accretion events. The remaining question 
is whether the accretion history of a halo can be constrained 
with real observational data. 
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Fig. 6. — Normalized cumulative distribution of radial distance (with re- 
spect to the Sun) of groups discovered by the group-finder for various photo- 
metric surveys. The distance of a group is defined as the heliocentric distance 
to the point within a group that has maximum number density of stars. The 
distributions of radial and circular halos are easily distinguishable in the plots. 
Note, for the simulated halos the plotted distance is the actual distance and 
not the one infeired from properties of stars. 



TABLE 2 

Number of recovered groups for various surveys 
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4.4. Fraction of material in groups 

Our results (top row TableO suggest that about 9% of ma- 
terial is in unbound recovered groups. As mentioned earlier 
fraction of material in groups is found to vary greatly from 
halo to halo and this is because it is typically dominated by a 
few luminous accretion events. A significant fraction of mate- 
rial is also contained in bound groups. When these are taken 
into account the fraction is found to rise to 36% (last four 
columns of top row in Table O. Nevertheless, the fraction of 
material in unbound groups is still quite small. To explore 
this further, we plot in Figure |7] the fraction of material in re- 
covered groups as a function of galactocentric radius (black 
solid line). The contributions of bound and unbound groups 
are also shown separately (blue and dark green solid lines). 
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TABLE 3 

Fraction of material in recovered groups for various surveys 





Unbound groups only 


All groups, both bound and unbound 
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0.07 


2e-3 


0.20 
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0.18 
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0.41 
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0.16 
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0.03 
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0.08 
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0.33 


0.30 


S5 


3e-5 
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0.10 





le-2 


0.04 
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Fig. 7. — The fraction of material recovered as groups as a function of 
galactocentric radius for the idealized survey S 1 (averaged over 1 1 ACDM 
halos). The fraction of material for bound and unbound groups is shown 
separately. The dotted curves show the actual fraction of material in bound 
and unbound structures in the simulations. Note, the recovered and actual 
fraction of material in bound structures ai'e indistinguishable in the plot. 

To create the plots the data points were tagged as recovered 
or un-recovered and bound or unbound and then binned by 
radial galactocentric distances. The dashed lines show the 
actual fraction of material in bound and unbound structures 
in the simulations. For bound groups the actual fraction is 
indistinguishable from the recovered fraction, which simply 
illustrates that all bound groups are correctly recovered by 
our clustering scheme. The dashed lines, i.e., the actual frac- 
tions, show that outer halo (beyond 120 kpc) is predominantly 
bound whereas the inner halo (less than 60 kpc) is predom- 
inantly unbound. Next we look at the recovered fractions. 
The /unb°cmiid^(^) Quitc low in the inner regions and then 
rises slowly and peaks at around 100 kpc. This rise coincides 
with a fall of fraction f{,l''°;^"'"^{r). This is due to the fol- 
lowing two facts a) low e (circularity of orbit) are more likely 
to be unbound than circular events since they pass close to 
the center, b) low e events are more likely to be found at the 
apocenter and their apocenter is further away than that of an 
high e event of similar energy. The black line representing 
/boimd+unbound(^) shows that the outer halo (beyond 60 kpc) 
is highly structured whereas the inner halo (less than 40 kpc) 
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Fig. 8. — Detection probability of accretion events as a function of time for 
different data sets. The detection probability was computed by binning the 
accretion events along time in bins of 1 Gyr and subsequently computing the 
ratio of the number of unique accretion events detected by the group finder to 
the total number of events in a bin. Data from all 1 1 ACDM halos was used. 

which contains about 50% of the material (as shown by the 
red line) has very little material which can be recovered as 
structures. This implies that in the inner regions strong phase 
mixing greatly limits the amount of material that can be re- 
covered as structures in 3-d configuration space. The analysis 
also suggests that for a given survey the fraction of material 
in groups will depend sensitively upon the contribution of the 
inner halo to the survey, which in turn is determined by the 
geometry and the depth of the survey. 

A few other issues which can affect the fraction of material 
in groups are as follows. We have here considered only the 3- 
d configuration space, additional information in the form of 
velocities should help detect more groups and increase the 
fraction of material in groups. Also, the choice of photo- 
metric selection function can decrease or increase the fraction 
of material in groups depending upon the contribution of the 
smooth component to the sample of stars in the survey (see 
Section l572l for further details). Issues related to our choice of 
the clustering scheme which can affect the fraction of material 
in groups are discussed in Section l6^ 

5. RESULTS II: SURVEYS IN A REALISTIC UNIVERSE 

In this section we apply the group-finder to our more real- 
istic synthetic surveys of stellar halos (data sets S2-S5), com- 
pare their sensitivity to different accretion events and assess 
their ability to distinguish accretion histories. We apply the 
group-finder with fc = 30 and selecting 5Th according to 
equation (fTTl l to account for the difference in the sample sizes, 
to all seventeen stellar halo models within each data set. 

5. L The effect of stellar populations 

Figure |8] summarizes our results by plotting the number 
fraction of accretion events recovered as a function of accre- 
tion time, where each symbol represents a 1 Gyr interval. As 
expected, our idealized survey S 1 recovers the largest fraction 
of events at all times, and the SDSS MSTO survey (S5), which 
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Fig. 9. — Sampling probability of accretion events in the (L, tacc) param- 
eter space for different photometric surveys relative to data set S 1 . IDifferent 
data sets are labeled on the plot. 

covers less than 1 % of the volume of any of the other surveys, 
recovers the fewest. However, there is no simple answer to the 
question of which survey is most sensitive. Our synthetic data 
sets represent surveys of different stellar tracers (i.e. M-giant, 
MSTO or RR Lyrae stars) selected with different observing 
strategies. As a result, these data sets explore the space around 
the Galaxy with a variety in the numbers of stars, dept hs and 
accuracies in distance estimates, as outlined in Section 12.2.11 
and summarized in Table[T] In addition to these differences in 
spatial exploration, systematic differences in the stellar popu- 
lations of objects of different luminosities and accretion times 
(e.g. as reflected in the stellar-mass/metallicity relation for 
Local Group dwarfs, see Larson 1974) means that the choice 
of tracers affects the relative number of stars contributing to 
each data set from different accretion events. 

Figure|9]illustrates this effect by plotting the sampling prob- 
ability of accretion events in the L-ts.cc plane relative to data 
set SI- the sampling probability being computed as the prob- 
ability distribution of stars appearing in a survey in the L-ts.cc 
plane (the L and tacc of a star being that of its parent satellite). 
The top panel, for data set S2, clearly shows the expected bias 
of M-giants towards tracing the highest metallicity and hence 
highest luminosity events. In addition, M-giants are interme- 
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Fig. 10. — Sampling probability of accretion events in the (L, tacc) pa- 
rameter space for different photometric surveys relative to data set S 1 . Dif- 
ferent data sets are labeled on the plot and are subsamples of data set S 1 . The 
plots shows the effect of magnitude limits and latitude Umits on the sampling 
probability. The top plot is for latitude restricted to 6 > 37.74, the middle 
panel is for SDSS r band magnitude Umited to r < 22 and the bottom panel 
is for both r < 22 and b > 37.74°. 

diate age stars, which means that such surveys do not contain 
stars from ancient accretion events. 

The second two panels show the sampling probability for 
our deep MSTO surveys S3 and S3', which differ only in the 
color range from which stars are selected. Both surveys con- 
tain stars from lower luminosity objects and earlier accretion 
times than the S2 surveys. A comparison of the two shows 
that as the red edge of the color limit increases, the sampling 
probability increases for old and high luminosity events (up- 
per right hand region in the plots) and decreasing the red edge 
of the color limit has the opposite effect, i.e., sampling prob- 
ability increases for recent and low luminosity events (lower 
left hand region in the plots). This is because a) the blue edge 
of the MSTO stars in an isochrone (knee shaped feature in the 
color magnitude diagram) shifts red-wards with the increase 
in age and metallicity of the stars and b) the high luminosity 
events are also metal rich. Increasing the photometric errors 
has an effect similar to increasing the color range. This IS 
the reason why the sampling probability for high luminosity 
events is slightly higher for S3 as compared to SI although 
both surveys have the same color limits. 

The fourth panel, for the RR Lyrae survey (data set S4), 
shows the strongest bias towards old and low-luminosity 
events as RR Lyraes are old, low-metallicity stars. 

The bottom panel, for the MSTO samples with SDSS sky- 
coverage and magnitude limits (i.e. data set S5), indicates that 
stars from recent, low-luminosity events and ancient, high- 
luminosity events are poorly represented in this survey com- 
pared to the SI sample. Recall that SI surveyed all-sky to 

' Increasing the photometric errors, also results in an increase in contam- 
ination from red dwarfs, which have low magnitude and hence high photo- 
metric errors. 
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greater depth (r=24.5) than S5, with the same color cut, but as- 
suming perfect knowledge of distances. Figure [TOl shows the 
sampling probability for surveys like SI but with an SDSS- 
like latitude (top panel) and magnitude (middle panel) cut. 
Combining the two cuts (bottom) panel reproduces the bulk of 
the features in the lower panel of Figure|9]which demonstrates 
that accretion events are "missing" from the SDSS sample 
simply because of the small sample volume rather than errors 
in distance estimates. 

Overall, we conclude that the variety of observing strate- 
gies and tracers used in current and future surveys can lead to 
a range in the contrast that a given structure may have with re- 
spect to the smooth background. This suggests that different 
surveys will be sensitive to different types of accretion events. 

5.2. Events recovered as groups 

The last five rows in Figure |4] show the fraction of events 
recovered as groups from our S2, S3, S3', S4 and S5 surveys 
of the ACDM stellar halo models, which can be broadly un- 
derstood given the survey characteristics outlined in Table [T] 
and Section lSTI 

For example, relative to the other surveys, the S2 (M-giant) 
data sets contain a low number of stars, that explore only out 
to ~ 100 kpc from the center of the Milky Way with modestly 
accurate distance estimates and a bias towards intermediate 
age, high-metallicity populations. These effects combined, 
explain the absence of groups in the S2 panels of Figure|4]de- 
tected from events that were: old (M-giant stars are interme- 
diate age); low luminosity (M-giant stars are relatively high 
metallicity); or on eccentric orbits (with apocenters beyond 
100 kpc where debris spends most of its time). However, de- 
spite the relatively low number of stars overall, these surveys 
are particularly well-suited to finding signatures of high lu- 
minosity, relatively recent accretion events: these stand out 
in M-giants at high surface brightness (because of the high- 
luminosity and recent accretion of the progenitor) and better 
contrast to the background (since ancient events which form 
a smooth background are not represented). 

Data set S3 (MSTO stars) explores a similar volume to the 
S2 surveys, but contains a far greater number of stars, with 
less accurate distance estimates and a wider variety in the 
ages and metallicities of stars. Overall Figure |4] demonstrates 
that these MSTO surveys can recover additional groups from 
lower luminosity and more ancient accretion events (because 
the survey contains lower metallicity and older stars), as well 
as those on more eccentric orbits (because the large distance 
errors mean that a significant fraction of stars scatter into the 
sample from beyond 100 kpc). 

Data set S3' is constructed the same way as S3, but with a 
finer color cut. This lowers the total number of stars, increases 
the accuracy of the distance estimates but decreases the con- 
tribution from older and higher metallicity stars. Since the 
last property decreases the contribution of the smooth back- 
ground, the S3' surveys do even better than the S3 surveys at 
recovering groups corresponding to low luminosity events. 

Data set S4 — the RR Lyrae surveys — contain only a com- 
parable number of stars to the M-giants surveys, but explore 
to far greater depth and with much more accurate distances 

* Note that we found that decreasing the photometric errors — as possible 
with co-added LSST data — emphasizes the trends found with making a finer 
color cut as there is less scatter in the colors. However, it is hard to assess the 
importance of this consideration given the simplicity of the stellar populations 
in our models. 



than S2, S3 or S3', and represent old, low-metallicity popula- 
tions. Figure |9] shows that stars from intermediate-luminosity 
satellites are missing from the survey, and Figure |4] confirms 
that these events are not recovered as groups. However, S4 
has a similar success as S3 and S3' in filling out other regions 
of accretion history space. 

Finally, data set S5 — the SDSS MSTO surveys — con- 
tains the same stellar populations as S3, but explores a much 
smaller volume because of the brighter magnitude limit. A 
comparison of the bottom panels of Figure |4] with and Fig- 
ure |9] demonstrates how this shallower magnitude limit ef- 
fectively eliminates sensitivity to many recent events which 
do not contribute significantly to the inner halo. Similarly, 
the bottom right-hand panel of Figure |4] shows that the group 
finder recovers no events on eccentric orbits that are likely to 
have debris beyond the edge of the SDSS survey. 

5.3. Mapping accretion history with combined surveys 

Recall that in our idealized surveys we found that number 
of groups (Table 121), amount of material in substructures (Ta- 
ble [3]) and the probability distributions of group masses (Fig- 
ure|5j and radial distance (Figure|6l) can be related to the num- 
ber, luminosity function and orbit distribution of recent accre- 
tion events. If we now account for the sensitivity of the differ- 
ent surveys to accretion events of different properties outlined 
in the previous two sections we can examine how such sur- 
veys could combine to present a picture of recent accretion 
history. Note that, because of the low number of groups re- 
covered, we have not included the results for the distribution 
of group properties in data set S5 in Figure|5]and Figure|6l 

For example, our S2 M-giant surveys are not expected to 
be sensitive to low luminosity events or those on very radial 
orbits. This insensitivity accounts for why Table|2]shows that 
almost no groups were recovered from these surveys of our 
mock-halos built entirely from radial orbit events, and why 
there is not a striking difference in the number of groups 
recovered from our low-luminosity model and our standard 
ACDM models. However, the S2 panel in Figure |5] clearly 
shows that our high-luminosity model halo gave rise to a 
much larger fraction in massive groups than our ACDM mod- 
els. Hence, we expect that 2MASS can already provide a rea- 
sonable census of recent accretion events from the upper end 
of the luminosity-function. 

Previous results for our S3 and S3' data sets suggest that 
information about the lower end of the luminosity function 
of recent accretion events might be filled in by these future 
MSTO surveys. Indeed, Table |2] indicate that these surveys 
detect a larger number of groups in our low-luminosity halo, 
which contained a larger number of low-luminosity accretors, 
when compared to our standard models. This indicates that 
MSTO surveys should recover debris from accreted objects 
— like the lower luminosity classical dwarf spheroidal and 
ultra-faint galaxies — that would be missed by 2MASS. 

Since they explore to far greater depth, future RR Lyrae 
surveys (S4) will have unique power to probe the outer halo, 
beyond 100 kpc and assess the fraction of debris that lies at 
these large radii. This power is demonstrated by the very clear 
difference in the S4 panel of Figure |6]between the radial dis- 
tribution of groups found in the RR Lyrae survey of our radial 
orbit halo (magenta line) compared to the ACDM cases (thin 
black lines). This difference implies that RR Lyrae surveys 
will flesh out our picture of the orbit distribution of accretors 
onto our Galaxy. Additionally they are also very efficient at 
detecting the low luminosity accretion events. 
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t„cc Gyr 

Fig. 11. — Detection probability of accretion events restricted to L > 
10^ Lq and high circularity (e > 0.4 or e > 0.8). Shown are the probabil- 
ities for data set SI and S2. It can be seen that circular events (e > 0.8 as 
compared to e > 0.4) can be detected much further back in time. Compari- 
son of the curves for S2 with that of S 1 illustrates that the 2MASS survey is 
almost complete for high luminosity and high circularity events. 

5.4. Limitations 

The synthetic surveys that we analyze do not include other 
galactic components, e.g., the disk and the bulge. However, 
this will have little impact on the analysis we present here 
because a) the volume of the stellar halo that we explore (100 
kpc and beyond) is much larger as compared to the volume of 
the disc, b) most stellar halo structures lie away from the disc 
and bulge regions. Note, even in the present analysis we have 
neglected grou ps th at lie within 5 kpc of sun or the galactic 
center (Section T3.1b . Another thing that we have neglected is 
the extinction which will make the detection of halo stars in 
the plane of the disc very difficult. In realistic surveys one 
might have to filter out the low lying latitudes as was done for 
real 2MASS data in Paper-1. However an analysis of S2 data 
after removing the low latitude regions revealed that this has 
little impact on the number of reported groups or the ability to 
discern different accretion history. In general a few structures 
will be missed while a few others will be split and discovered 
as two groups. 

For the MSTO surveys we have neglected the possibility of 
contamination by quasars (QSO). They are expected to con- 
taminate towards the blue color Umits. In the co lor range 
0.2 < g - r < 0.4 and r < 21.5 lJuric et al.l (120081) estimate 
the contamination by QSO's to be 10%. Since QSO's would 
be distributed isotropically and in a smooth fashion they will 
not give rise to any false structures but due to their contribu- 
tion to the background they might decrease the significance of 
some of the faint structures making them difficult to identify. 

A final limitation of our analysis is the fact that recent ac- 
cretion events with L < 10^ L© are not included in the stellar 
halo simulations that we use here. This is because the star for- 
mation recipes in the simulations were fine tuned to reproduce 
the observed distribution of satellites known at the time of the 
publication of Bullock & Johnston (2005), and the ultra faint 
galaxies were discovered later on. Our analysis shows that 
the MSTO and RR Lyrae surveys are sensitive to low lumi- 
nosity events hence we anticipate such surveys to detect more 
structures than that reported by us. 

6. APPLICATIONS TO THE REAL UNIVERSE 
6. 1 . The 2MASS M- giant sample 

In Paper I we identified structures in the 2MASS M-giant 
sample and found sixteen groups, seven of which corre- 
sponded to known bound structures, one was probably a part 
of the disk and two others were due to masks employed in the 
data. Excluding these, we found six unbound groups. 

The 2MASS results in Table |2] and Table [3] and in bold 



dashed lines in Figure|5]and Figure|6]provide a detailed com- 
parison between the properties of these observed groups and 
those recovered from the synthetic M-giant surveys (data set 
S2) of our eleven ACDM models (thin black lines). Overall, 
we find broad consistency between the observations and mod- 
els. In particular, the total number of groups (six) sits close to 
the middle of the range found in the models (8.4± 8). 

Note the results reported for data set S2 are without any se- 
lection by latitude. We also repeated the analysis by applying 
the same latitude limits used to generate our 2MASS M-giant 
sample but did not find a significant change in the mean num- 
ber of recovered groups for ACDM halos, or in the figures. 
However, the fraction of material in groups was found to be 
around 30%, which is about 3 times higher than that reported 
in Table [3] (unbound groups). This is because much of the 
stellar halo mass is concentrated towards the center and the 
plane of the galaxy. This mass near the center of the galaxy 
does not contain any substructures and its exclusion results 
in an increase in the mass fraction of recovered groups. On 
the other hand in the real 2MASS M-giant data we find the 
mass fraction in groups to be 4.5%, which is quite low. The 
reason for this is that in the real 2MASS data nearly 75% of 
the stars are due to the LMC which is a bound satellite system 
and this is quite un-typical for a ACDM halo. For comparison 
in our simulated halos the maximum fraction of mass in a sin- 
gle bound satellite was 20%. Excluding LMC, increases the 
fraction of material in groups to 18%, which is more typical 
of what we see in simulated halos. 

We can also use our synthetic surveys to ask whether the 
2MASS groups are likely to correspond to real accretion 
events and, if so, what type of events. Table 1 shows that 
the typical purity of groups recovered from the simulated ha- 
los in data set S2 was high, which supports the interpretation 
of the groups in 2MASS as real physical associations. More- 
over, the results from Section|5]suggest that these groups cor- 
respond to recent (< 10 Gyr ago), high-luminosity accretion 
(> 5 X 10^ Lq) events on orbits whose apocenters lie within 
100 kpc. 

Finally, we can test what the 2MASS groups can contribute 
to our picture of accretion history by asking what fraction of 
accreting objects we expect to detect as groups in this data set. 
Figure[8fsuggest the groups represent 60% of objects accreted 
within the last 6 Gyr, falling to as little as 10% of those ac- 
creted 7.5-8.5 Gyr ago and 0% of those with accreted earlier 
than 11.5 Gyr ago. While these numbers clearly portray the 
weakness of the M-giants as tracers of accretion history, they 
do not capture their strength. Figure [TT] repeats Figure [H but 
this time restricting attention to events that we know contain 
M-giants within the apparent magnitude range of the 2MASS 
survey — those that have high-luminosity (> lO^L©) and are 
on mildly-eccentric orbits (e > 0.8). 100% of these events 
are recovered that were accreted within the last 8 Gyr, and 
more than 50% with accretion times less than 10 Gyr ago. We 
conclude that the structures in 2MASS give us a fairly com- 
plete census of recent, massive accretion events along mildly- 
eccentric orbits. 

6.2. SDSS MSTO sample 

While a full application of our group finder to the SDSS 
MSTO sample is beyond the scope of the current work, 
analysis of our S5 data sets suggest we might expect to 
find 0-4 groups, corresponding to relatively high-luminosity, 
intermediate-age accretion events on mildly eccentric orbits. 
The S2 and S5 panels of Figure |4] imply that the major- 
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ity (though not all) of these events would also be recovered 
from the 2MASS M-giant sample. Indeed, 4 groups that are 
plausibly associated with the ancestral siblings of the clas- 
sical dwarf spheroidal satellites are apparent through visual 
inspection of SDSS MSTO sample (the tails from disruption 
of Sagittarius, the Monocerous ring and the Virgo oyerden - 
sity, and the Orphan Stream, see e.g. iBelokurov et aLll2006l) . 
and two of these are also seen in the M-giants. This again 
indicates broad consistency between structures seen in the 
real stellar halo and those in our models. (While many more 
streams from globular clusters and lower-luminosity dwarfs 
hav e been found in S PSS using matched-filtering techniques, 
e.g.. lGrillmairll2009l these objects are missing from the stel- 
lar halo models, so we compare only the number of higher- 
luminosity streams) 

From Table [3]it can be seen that, for data set S5, the mean 
fraction of material in groups is 4% which increases to 6% 
when bound groups are also included. These results are con- 
sistent with what we see for S3 survey, if we take into account 
the shallow depth of the S5 survey due to which the outer 
halo, which is also more structured (see Figure |7]i, is missed 
. Similarly, most bound structures are in the outer parts of 
the halos (see Figure |7]i, hence unlike other surveys, includ- 
ing them does not increase the fraction by m uch for S5 survey . 
However, when compared to the results of Bell et al.l (120081) . 
where they find the fractional rms deviation from a smooth 
analytic model, cr/total, to be around 40% for SDSS MSTO 
stars, our mass fractions are apparently low. It should be noted 
that Bell et al. (2008) had also analyzed the same set of halos 
as used by us and reported good agreement with the SDSS 
data. Hence the cause of the mismatch is due to the method- 
ologies being different, which we explore below. Firstly, we 
think that cr/total cannot be directly interpreted as amount of 
mass in structure, e.g., considering equally populated bins, if 
10% of the bins differ in mass by order of the mass in the 
bins one gets cr/total = 0.33. Secondly, an analytical model 
as adopted by them might not necessarily be a good descrip- 
tion of the smooth component of the halo. This misfit will 
contribute to cr/total but will not give rise to any structure in 
our scheme. It is also important to note that in our cluster- 
ing scheme only those structures are detected which give rise 
to peak in the density distribution. Thirdly, the group finder 
truncates a structure when its isodensity contour hits that of 
another structure. Hence the envelope region around a struc- 
ture can contribute to cr/total but is ignored by us. Note, by 
following points in the envelope regions along density gradi- 
ents one can associate them with the groups but such a classifi- 
cation is not free from ambiguity and in general decreases the 
purity of the groups. Finally, in our analysis we only consider 
significant groups which stand out above the Poisson noise. 
However, there might be low significance fluctuations which 
we have deliberately ignored and these will invariably con- 
tribute to cr/total. However, low significance fluctuations un- 
less they are massive, which is rare, wont dominate the mass 
fraction in structures. To conclude we think that the method- 
ologies being very different it is very difficult to interpret the 
results of one scheme in terms of the other 

7. SUMMARY 

In this paper, we have explored the power of a group find- 
ing algorithm to recover structures from photometric surveys 
of the stellar halo and interpret their properties in terms of 
Galactic accretion history. 

We first applied our group finder to idealized synthetic stel- 



lar surveys, which were generated from our ACDM models 
without accounting for observational errors. We find a sim- 
ple dependence for the probability of detecting debris as a 
group on the parameters of its progenitor accretion event: the 
probability is highest for recent (small tacc) and high lumi- 
nosity (large L) events accreted along circular orbits (large e). 
The strongest dependence is on tacc- The properties of re- 
covered groups — the number of and fraction of material in 
groups, along with distribution of the stellar mass and radial 
distance of the groups — can in principle place constraints 
on the recent accretion history of a halo. Ancient accretion 
events (> 10 Gyr ago) are not recovered as groups even in 
the absence of observational limitations because they are too 
phase-mixed to appear as distinct structures. 

We then applied our group finder to synthetic surveys that 
contained more realistic observational errors. Our results em- 
phasize that the capability of a photometric survey to discover 
structures depends upon its sample size, the distance uncer- 
tainty, the depth and the relative sampling probability of dif- 
ferent stellar populations. The broadest constraints on accre- 
tion history will come by combining the results of current and 
future surveys. For example, M-giants selected from 2MASS 
are intermediate age, high metallicity stars and hence are good 
tracers of relatively recent, high-luminosity accretion events 
with little contamination from older events. An LSST MSTO 
survey would contain a range of stellar populations whose 
properties depend on the severity of the color cut made to se- 
lect the sample. Limiting the sample to the very bluest MSTO 
stars increases the dominance of low-metallicity stars in the 
sample, and hence the sensitivity to low-luminosity accretion 
events. RR Lyraes selected from LSST are bright enough to 
probe beyond 100 kpc where the apocenters of the more ec- 
centric orbits lie, and hence will find a more fair sampling of 
the orbital properties of accretion events. 

Finally, a quantitative comparison of the results of apply- 
ing the group-finder to the real 2MASS M-giant sample with 
those from the mock (synthetic) 2MASS M-giant surveys 
shows the number and properties of substructures in 2MASS 
M-giant survey to be roughly in agreement with simulated 
ACDM stellar halos. These groups most likely correspond 
to satellites accreted more recently than 10 Gyr, with lumi- 
nosity higher than 5x10^ and preferably on orbits of low 
eccentricity. 

Overall we conclude that current and near-future photomet- 
ric surveys are poised to provide a complete census of our 
Galaxy's recent accretion history. The current results from 
2MASS alone map the highest-luminosity recent events, fu- 
ture deep MSTO surveys will fill in the lower end of the lumi- 
nosity function and RR Lyrae surveys will find debris struc- 
tures that may be currently missing because the progenitor 
satellites were on highly radial orbits. Reconstructing more 
ancient accretion will require additional dimensions of data, 
such as velocity (proper motions and radial velocities of stars) 
and chemical abundance information. 
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APPENDIX 
EFFECT OF SUB-SAMPLING 

In Section 12.2.21 we had mentioned that for computational ease we had sub-sampled the data sets SI and S3 by a factor of 
0.25. This sub-sampling resulted in a sample size of about 10^ while the actual sample size would have been 4 x 10^. To 
check if sub-sampling has any effect on the results of our group finding analysis, we ran the group finder on the eleven ACDM 
halos with ten and 100 times lower number of stars, i.e., sample sizes of 10^ and 10^ respectively. The parameter Sth was set 
according to equation ( fTTT i. For data set SI lowering the resolution by ten and 100 times resulted in a reduction in the number 
of detected groups by 14% and 64% respectively while the purity was found to be remain approximately constant. This suggests 
that increasing the sample size from 10^ to 10^ results in significant improvement in clustering performance but beyond 10*^ the 
performance tends to saturate. This saturation of clustering performance is partly due to the fact that in the simulations there is a 
finite lower limit on the luminosity of the satellites that are simulated. Similar trends were also seen for the data set S3- lowering 
the resolution by 10 times resulted in a reduction in number of groups by 10%. Hence we conclude that choosing /sample = 1 
instead of 0.25 as done by us should only give a moderate improvement in clustering performance. 



